A Brief Introduction & Application of RCpp Package (and its extensions): Making R Execution Faster

Muhammad Jaffri Mohd Nasir

2024-09-27

Recent post …

RCpp is not a popular package, but still usefull for many reasons.

Abstract

\(R\) language can be painfully slow when it comes to problems with large numbers of loop/iterations/recursions, accessing large elements in vectors/ matrices and applying mathematical operations. Accessing supercomputers to speed up \(R\) computational time can be impossible and costly. On the other hand, \(C\) language is known to be one of the fastest programming languages but might be overwhelming to code.

Introduced and maintained by Eddelbuettel et. al., the \(RCpp\) package provides easy integration and transition of objects between \(C\) and \(R\) languages through \(R\)’s application programming interface (API), thus mitigating bottlenecks and slowness in our code.

In this short talk, we will go through some basics of the \(RCpp\) that are absolutely necessary to help us mitigate bottlenecks in our code and introduce some of its useful extensions (e.g., RcppArmadillo). To motivate the audience, a few live applications are shown to demonstrate the speed of the \(RCpp\) package

Brief intro to C++ and RCpp (1)

Brief intro to C++ and RCpp (2)

Common variable types in C++
Common variable types in C++

Brief intro to C++ and RCpp (3)

C++ is useful, but with limitations. E.g., does not have matrices functions, only arrays.
C++ is useful, but with limitations. E.g., does not have matrices functions, only arrays.

Brief intro to C++ and RCpp (4)

Using C++ code in R (Two ways), one output

library(Rcpp)
# Attaching package: 'Rcpp'
# The following object is masked from 'package:inline':
#registerPlugin
cppFunction(code =
'NumericVector fiboCpp(int n) {
    NumericVector result(n);
    if (n > 0) result[0] = 0;
    if (n > 1) result[1] = 1;
    for (int i = 2; i < n; i++) {
    result[i] = result[i-1] + result[i-2];
    }
    return result;
}')
fiboCpp(100)
library(Rcpp)
sourceCpp("fiboCpp")
fiboCpp(100)

*File named “fiboCpp.cpp” with the “.cpp” extension is exported.

Example: Fibonacci Sequence

\[F_{n} = F_{n-1} + F_{n-2}\] for \(n>1\), with conventions \(F_{0} = 0\) and \(F_{1} = 1\). It has several practical usage. Codes below are generated by Microsoft Copilot.

R code RCpp code (fiboCpp.cpp)
fiboR <- function(n) {
  fib_seq <- numeric(n)
  fib_seq[1] <- 0
  if (n > 1) fib_seq[2] <- 1
  for (i in 3:n) {
    fib_seq[i] <- fib_seq[i-1] + fib_seq[i-2]
  }
  return(fib_seq)
}
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector fiboCpp(int n) {
  NumericVector fib_seq(n);
  if (n > 0) fib_seq[0] = 0;
  if (n > 1) fib_seq[1] = 1;
  for (int i = 2; i < n; i++) {
    fib_seq[i] = fib_seq[i-1] + fib_seq[i-2];
  }
  return fib_seq;
}

Benchmarking Fibonacci Sequence

Using R’s ‘rbenchmark’ package:

library(rbenchmark)
benchmark(fiboR(1e5), fiboCpp(1e5),
          columns=c("test", "replications", "elapsed", "relative"),
          order="relative",
          replications=1)

RCpp: More than one output

library(Rcpp)
cppFunction(code = 
'List fiboCpp(int n) {
    NumericVector result(n);
    double resultFibo;
    if (n > 0) result[0] = 0;
    if (n > 1) result[1] = 1;
    for (int i = 2; i < n; i++) {
    result[i] = result[i-1] + result[i-2];
    resultFibo = result[i];
    }
    return Rcpp::List::create(
           Rcpp::Named("FiboSeq")=result,
           Rcpp::Named("Fibo")=resultFibo);
}')
fiboCpp(100)

Note that

Enabling further C++ extension to R: Case of RCppArmadillo (1)

- Similar package: Eigen, RCppEigen

Enabling further C++ extension to R: Case of RCppArmadillo (2)

With Armadillo package, this enable us to include vector and matrices as variables via command “arma::vec” and “arma::mat”, respectively. For inline version, we have the following example:

library(Rcpp)
library(RcppArmadillo)
vecA<-c(1,2,3,4)
vecB<-c(4,5,6,7)
cppFunction(depends='RcppArmadillo', code = 
  'arma::mat MAT1(arma::vec veca, arma::vec vecb) {
    arma::mat C;
    C = veca*vecb.t();
    return C;
}')
MAT1(vecA, vecB) #This C++ command generates 4 X 4 matrix from the multiplication of two vectors
##      [,1] [,2] [,3] [,4]
## [1,]    4    5    6    7
## [2,]    8   10   12   14
## [3,]   12   15   18   21
## [4,]   16   20   24   28
as.matrix(vecA)%*%t(as.matrix(vecB)) #This R command converts two vectors to matrices then multiplies
##      [,1] [,2] [,3] [,4]
## [1,]    4    5    6    7
## [2,]    8   10   12   14
## [3,]   12   15   18   21
## [4,]   16   20   24   28

Enabling further C++ extension to R: Case of RCppArmadillo (3)

For external Cpp file, follow this command:
RCpp code
#include<RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat MAT1(arma::vec veca, arma::vec vecb) {
    arma::mat C;
    C = veca*vecb.t();
    return C;
}

Example: Estimating thresholds in time series (1)

Consider the following threshold autoregressive model: \[ \begin{split} y_{t}=\begin{cases} 1 -0.4 y_{t-1} +\varepsilon_{t},& \quad \text{if}\quad y_{t-1} \in (-\infty, -0.8],\\ 0.6 +y_{t-1}+\varepsilon_{t},& \quad \text{if}\quad y_{t-1} \in (-0.8, 0.5],\\ -1 -0.2y_{t-1} + \varepsilon_{t},& \quad \text{if}\quad y_{t-1} \in (0.5, \infty). \end{cases} \end{split} \\ \] with \(\varepsilon_{t}\sim N(0,1)\). The equation/system change as \(y_{t-1}\) breaches thresholds -0.8 and 0.5. We generate 1200 time points sequence starting \(t=1,\cdots,1200\).

We will utilizing two step threshold estimation approach, starting with group least angle regression (gLAR) algorithm and backward elimination algorithm (BEA).

We have two sets of code: - Fully R code - Hybrid of R and C++

For C++, RcppArmadillo is used because it involves a lot of importing and exporting vectors and matrices from R to C (and vice-versa).

Example: Estimating thresholds in time series (2)

Top lines: Example of Calculating SSE for BEA (Full codes are in Github)
R C++
COMPSSE<-function(Xo,Yo,Z,r1,r2,N){
for(l1 in 1:N){
     if(r1<Z[l1,1] && Z[l1,1]<=r2){
        I<-1
      }else{I<-0}
     YB<-as.matrix(Xo[l1,])
     if(l1==1){
        M1<-(YB%*%t(YB))*I
        M2<-(YB*Yo[l1,1])*I
     }else{
        M1<-M1+((YB%*%t(YB))*I)
        M2<-M2+((YB*Yo[l1,1])*I)
     }
 }
 BETAo<-ginv(M1)%*%M2
#include<RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
List COMPSSE(arma::mat Xo, arma::mat Yo, 
  arma::vec Z, double r1, double r2, int N) {

arma::mat BETAo,SSE,YB,M1,M2,M3;
int I1,I2,COUNT;

  for(int L=0; L < N ; L++){
        if(r1 < Z[L] && Z[L]<=r2){
          I1=1;
        }else{I1=0;}
      if(L==0){COUNT=I1;}else{COUNT=COUNT+I1;}
      YB=Xo(span(L),span::all).t();
         if(L==0){
          M1=(YB*YB.t())*I1;
          M2=(YB*Yo(span(L),span::all))*I1;
         }else{
          M1=M1+((YB*YB.t())*I1);   
          M2=M2+((YB*Yo(span(L),span::all))*I1);
         }
     }
      BETAo=pinv(M1,0.01)*M2;

Example: Estimating thresholds in time series (3)

Running R codes and hybrid codes for benchmarking. Here, we are obtaining 10 threshold candidates using gLAR algorithm, then filter the candidates using BEA. Left side is using full R code, Right side is using hybrid of R and C++ through RCpp

Can we further speed up RCpp? (1)

Perhaps we can utilize our multiple processing cores/treads. Meet OpenMP. Consider the following example of approximating Pi using Madhava–Leibniz series: \[ \pi = 4(1 - 1/3+1/5 - 1/7 + 1/9 - \cdots)=4\sum_{k=0}^{\infty}(-1)^{k}/(2k+1)\]
Serial Parallel
#include<Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
  
double compute_pi_serial(long num_steps) 
{ 
    double sum = 0.0; 
    for (long i = 0; i < num_steps; i++) { 
        sum += pow(-1,i)/((2*i)+1); 
    } 
    return 4*sum; 
} 
#include<Rcpp.h>
#include <omp.h>
using namespace Rcpp;
// [[Rcpp::export]]
  
double compute_pi_parallel(long num_steps) 
{ 
    double sum = 0.0; 
    // parallelize loop and reduce sum variable
#pragma omp parallel for reduction(+ : sum) 
    for (long i = 0; i < num_steps; i++) { 
        sum += pow(-1,i)/((2*i)+1); 
    } 
    return 4*sum;
} 

Easier to enable OpenMP via Linux OS (e.g., Ubuntu, Linux Mint) by running the command “Sys.setenv(”PKG_CXXFLAGS” = “-fopenmp”)” in R Console.

Can we further speed up RCpp? (2)

Benchmarking: Serial vs Parralel

RCpp Resources

.

Thank You. Prepared using